{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Create an imbalanced dataset\n\nAn illustration of the :func:`~imblearn.datasets.make_imbalance` function, which\ncreates an imbalanced dataset from a balanced one. We also show that\n:func:`~imblearn.datasets.make_imbalance` accepts a pandas DataFrame as input.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: Dayvid Oliveira\n#          Christos Aridas\n#          Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: MIT"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import seaborn as sns\n\nsns.set_context(\"poster\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Generate the dataset\n\nFirst, we generate a balanced two-class dataset and convert it to a\n:class:`~pandas.DataFrame` with arbitrary column names. We then plot the\noriginal dataset.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\nimport pandas as pd\nfrom sklearn.datasets import make_moons\n\nX, y = make_moons(n_samples=200, shuffle=True, noise=0.5, random_state=10)\nX = pd.DataFrame(X, columns=[\"feature 1\", \"feature 2\"])\nax = X.plot.scatter(\n    x=\"feature 1\",\n    y=\"feature 2\",\n    c=y,\n    colormap=\"viridis\",\n    colorbar=False,\n)\nsns.despine(ax=ax, offset=10)\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Make a dataset imbalanced\n\nNow, we show the helper :func:`~imblearn.datasets.make_imbalance`, which\nrandomly selects a subset of samples so that the resulting class\ndistribution matches the one specified by its parameters. Here, the\n`sampling_strategy` callable `ratio_func` returns the number of\nminority-class samples to keep.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
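      },
      "outputs": [],
      "source": [
        "from collections import Counter\n\n\ndef ratio_func(y, multiplier, minority_class):\n    # Count the samples of each class in the original target vector.\n    target_stats = Counter(y)\n    # Keep only a fraction `multiplier` of the minority-class samples;\n    # classes absent from the returned dict are left untouched.\n    return {minority_class: int(multiplier * target_stats[minority_class])}"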
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from imblearn.datasets import make_imbalance\n\nfig, axs = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))\n\nX.plot.scatter(\n    x=\"feature 1\",\n    y=\"feature 2\",\n    c=y,\n    ax=axs[0, 0],\n    colormap=\"viridis\",\n    colorbar=False,\n)\naxs[0, 0].set_title(\"Original set\")\nsns.despine(ax=axs[0, 0], offset=10)\n\nmultipliers = [0.9, 0.75, 0.5, 0.25, 0.1]\nfor ax, multiplier in zip(axs.ravel()[1:], multipliers):\n    # Extra keyword arguments are forwarded to ratio_func.\n    X_resampled, y_resampled = make_imbalance(\n        X,\n        y,\n        sampling_strategy=ratio_func,\n        multiplier=multiplier,\n        minority_class=1,\n    )\n    X_resampled.plot.scatter(\n        x=\"feature 1\",\n        y=\"feature 2\",\n        c=y_resampled,\n        ax=ax,\n        colormap=\"viridis\",\n        colorbar=False,\n    )\n    ax.set_title(f\"Sampling ratio = {multiplier}\")\n    sns.despine(ax=ax, offset=10)\n\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}